Executing Stream Joins on the Cell Processor

Authors

  • Bugra Gedik
  • Philip S. Yu
  • Rajesh Bordawekar
Abstract

Low-latency and high-throughput processing are key requirements of data stream management systems (DSMSs). Hence, multi-core processors that provide high aggregate processing capacity are ideal matches for executing costly DSMS operators. The recently developed Cell processor is a good example of a heterogeneous multi-core architecture and provides a powerful platform for executing data stream operators with high performance. On the downside, exploiting the full potential of a multi-core processor like Cell is often challenging, mainly due to the heterogeneous nature of the processing elements, the software-managed local memory on the co-processor side, and the unconventional programming model in general. In this paper, we study the problem of scalable execution of windowed stream join operators on multi-core processors, and specifically on the Cell processor. By examining various aspects of the join execution flow, we determine the right set of techniques to apply in order to minimize the sequential segments and maximize parallelism. Concretely, we show that basic windows coupled with low-overhead pointer-shifting techniques can be used to achieve efficient join window partitioning, column-oriented join window organization can be used to minimize scattered data transfers, delay-optimized double buffering can be used for effective pipelining, rate-aware batching can be used to balance join throughput and tuple delay, and finally SIMD (single-instruction multiple-data) optimized operator code can be used to exploit data parallelism.
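The abstract compresses several mechanisms into one sentence. The sketch below illustrates only two of them, basic windows and a column-oriented window layout, plus a naive split of basic windows across workers. It is a hypothetical Python illustration that assumes time-based windows, an equality join predicate on a single key, and nondecreasing timestamps. The names `BasicWindow`, `JoinWindow`, `insert`, `expire`, and `probe` are invented here; the sketch does not reproduce the paper's Cell/SPE code, its pointer-shifting scheme, DMA double buffering, rate-aware batching, or SIMD kernels.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BasicWindow:
    """One fixed-length time slice of a join window, stored column-wise."""
    start: int                                  # slice start time (seconds)
    keys: list = field(default_factory=list)    # join-attribute column
    stamps: list = field(default_factory=list)  # timestamp column


class JoinWindow:
    """Time-based sliding window split into basic windows of `bw_len` seconds.

    Hypothetical sketch: assumes nondecreasing timestamps and an equality
    join predicate on a single key column.
    """

    def __init__(self, window_len: int, bw_len: int):
        self.window_len = window_len
        self.bw_len = bw_len
        self.bws = deque()  # oldest basic window on the left

    def insert(self, key, ts: int) -> None:
        # Open a new basic window once ts falls past the newest slice.
        if not self.bws or ts >= self.bws[-1].start + self.bw_len:
            self.bws.append(BasicWindow(start=ts - ts % self.bw_len))
        self.bws[-1].keys.append(key)
        self.bws[-1].stamps.append(ts)

    def expire(self, now: int) -> None:
        # Drop whole slices whose every tuple is older than the window:
        # one popleft per slice instead of one deletion per tuple.
        while self.bws and self.bws[0].start + self.bw_len <= now - self.window_len:
            self.bws.popleft()

    def probe(self, key, now: int, worker: int = 0, n_workers: int = 1) -> int:
        """Count matches for `key` in this worker's contiguous share of slices."""
        slices = list(self.bws)
        lo = worker * len(slices) // n_workers
        hi = (worker + 1) * len(slices) // n_workers
        matches = 0
        for bw in slices[lo:hi]:
            # Scan the key and timestamp columns; the oldest slice may be
            # partially expired, so timestamps are still checked per tuple.
            for k, t in zip(bw.keys, bw.stamps):
                if k == key and t > now - self.window_len:
                    matches += 1
        return matches


w = JoinWindow(window_len=10, bw_len=2)
for t in range(12):
    w.insert(t % 3, t)      # six slices, starting at 0, 2, 4, 6, 8, 10
print(w.probe(0, now=11))   # live matches at t = 3, 6, 9 -> 3
print(sum(w.probe(0, 11, i, 3) for i in range(3)))  # same total over 3 workers -> 3
```

For a tuple `(key, ts)` arriving on stream A, a symmetric join in this sketch would call `b.expire(ts)`, then `b.probe(key, ts)`, then `a.insert(key, ts)`. Expiring whole basic windows keeps window maintenance cheap, and handing each worker a contiguous range of basic windows is one simple way to partition the probe work; keeping such partitions balanced as the window slides is the problem the paper's pointer-shifting technique addresses, which is not modeled here.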
Our experimental results show that, by following the design guidelines and implementation techniques outlined in this paper, windowed stream joins can achieve high scalability (linear in the number of co-processors) by making efficient use of the extensive hardware parallelism provided by the Cell processor (reaching data processing rates of ≈ 13 GB/sec), and can significantly surpass the performance obtained from conventional high-end processors (supporting a combined input stream rate of 2000 tuples/sec using 15-minute windows without dropping any tuples, resulting in an ≈ 8.3 times higher output rate compared to an SSE implementation on a dual 3.2 GHz Intel Xeon).

VLDB '07, September 23-28, 2007, Vienna, Austria. Copyright 2007 VLDB Endowment, ACM 978-1-59593-649-3/07/09.
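To put the reported configuration in perspective, the snippet below performs a back-of-the-envelope estimate of the comparison workload it implies. Two assumptions are made here that the abstract does not state: the 2000 tuples/sec combined rate is split evenly between the two input streams, and every arriving tuple is compared against every tuple in the opposite window (nested-loop probing). The result is an illustrative estimate, not a figure reported in the paper.

```python
# ASSUMPTION: the combined rate is split evenly across the two streams,
# and each arriving tuple is compared with every tuple in the opposite window.
COMBINED_RATE = 2000        # tuples/sec, both streams together (from the text)
WINDOW_SECONDS = 15 * 60    # 15-minute windows (from the text)

per_stream_rate = COMBINED_RATE / 2                   # tuples/sec per stream
tuples_per_window = per_stream_rate * WINDOW_SECONDS  # tuples held in one window
comparisons_per_sec = COMBINED_RATE * tuples_per_window

print(f"{tuples_per_window:,.0f} tuples per window")   # 900,000 tuples per window
print(f"{comparisons_per_sec:.2e} comparisons/sec")   # 1.80e+09 comparisons/sec
```

Under these assumptions, sustaining the stated rate without dropping tuples means on the order of 10^9 join-predicate evaluations per second, which is the kind of load that motivates spreading the window scan across co-processors and vectorizing it.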

Similar resources

Stream Vector Processing Unit: Stream Processing Using SIMD on a General Purpose Processor

Hypothesis: Modern scalar processors inefficiently use fetch bandwidth when executing vectorizable code. By augmenting a general-purpose processor with a Stream Vector Processing Unit (SVPU), we can use the fetch bandwidth much more efficiently and achieve speed-ups in performance on vectorizable code. We also believe that by using a hierarchy of register files, this architecture will use memory...

Stream Processing in General-Purpose Processors

To date stream processing has been applied to a variety of special purpose hardware architectures including stream processors, DSP, and graphics engines. We believe that the stream processing programming paradigm will also be a win for general-purpose processors, for executing both applications that have been identified previously for streaming such as media processing, as well as for wider cla...

Dynamic Instruction Stream Editing

Marc Corliss, E Christopher Lewis. This dissertation proposes a novel, cooperative hardware/software mechanism, called DISE (dynamic instruction stream editor), for efficiently transforming programs. DISE transforms programs using programmable instruction macro-expansion. It resides within the processor inspecting every fetched instruction. Based on user-defined...

Analyzing In-Memory Hash Joins: Granularity Matters

Predicting the performance of join algorithms on modern hardware is challenging. In this work, we focus on main-memory no-partitioning and partitioning hash join algorithms executing on multi-core platforms. We discuss the main parameters impacting performance, and present an effective performance model. This model can be used to select the most appropriate algorithm for different input data-set...

Conditional Techniques for Stream Processing Kernels a Dissertation Submitted to the Department of Electrical Engineering and the Committee on Graduate Studies of Stanford University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

The stream programming model casts applications as a set of sequential data streams that are operated on by data-parallel computation kernels. Previous work has shown that this model is a powerful representation for media processing applications, such as image- and signal-processing, because it captures the locality and concurrency inherent to an application. This careful handling of important ap...

Publication date: 2007